Preference for response-contingent vs. free reinforcement
Similar resources
Model-Free Preference-Based Reinforcement Learning
Specifying a numeric reward function for reinforcement learning typically requires a lot of hand-tuning from a human expert. In contrast, preference-based reinforcement learning (PBRL) utilizes only pairwise comparisons between trajectories as a feedback signal, which are often more intuitive to specify. Currently available approaches to PBRL for control problems with continuous state/action sp...
Contingent Features for Reinforcement Learning
Applying reinforcement learning algorithms in real-world domains is challenging because relevant state information is often embedded in a stream of high-dimensional sensor data. This paper describes a novel algorithm for learning task-relevant features through interactions with the environment. The key idea is that a feature is likely to be useful to the degree that its dynamics can be controll...
A Comparison of Noncontingent Plus Contingent Reinforcement to Contingent Reinforcement Alone on Students’ Academic Performance
Noncontingent reinforcement (NCR) can be described as time-based or response-independent delivery of stimuli with known reinforcing properties. Previous research has shown NCR to reduce problem behavior in individuals with developmental disabilities and to interfere with the acquisition of more desired alternative behavior. To date, however, little research has examined the effects of NCR on ch...
Preference-Based Policy Iteration: Leveraging Preference Learning for Reinforcement Learning
This paper makes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a “preference-based” approach to reinforcement learning is a possible extension of the type of feedback an agent may learn from. In particular, while conventional RL methods are essentially confined to deal with numeri...
Preference-based Reinforcement Learning
This paper investigates the problem of policy search based only on the expert’s preferences. Whereas reinforcement learning classically relies on a reward function, or exploits the expert’s demonstrations, preference-based policy learning (PPL) iteratively builds and optimizes a policy return estimate as follows: The learning agent demonstrates a few policies, is informed of the expert’s prefer...
Journal
Journal title: Bulletin of the Psychonomic Society
Year: 1977
ISSN: 0090-5054
DOI: 10.3758/bf03329300